The interaction of genetics and physical activity in the pathogenesis of metabolic dysfunction associated liver disease

Genetic variants associated with increased liver fat and volume have been reported, but whether physical activity (PA) can attenuate the impact of genetic susceptibility to these traits is poorly understood. We aimed to investigate whether higher PA modifies the genetic impact on liver-related traits in the UK Biobank cohort. PA was self-reported, while magnetic resonance images were used to estimate liver fat (n = 27,243) and liver volume (n = 24,752). Metabolic dysfunction-associated steatotic liver disease (MASLD) and chronic liver disease (CLD) were diagnosed using ICD-9 and ICD-10 codes. Ten liver fat-associated and eleven liver volume-associated genetic variants were selected, and unweighted genetic risk scores for liver fat (GRS LF) and liver volume (GRS LV) were computed. Linear regression analyses were performed to explore interactions between GRS LF/GRS LV and PA in relation to liver-related traits. The association between GRS LF and liver fat did not differ between individuals with lower PA (β = 0.063, 95% CI 0.041–0.084) and those with higher PA (β = 0.065, 95% CI 0.054–0.077, p interaction = 0.62). The association between GRS LV and liver volume did not differ across PA groups (p interaction = 0.71). Similarly, PA did not modify the effect of GRS LF and GRS LV on MASLD or CLD. Our findings suggest that physical activity and genetic susceptibility to liver-related phenotypes act independently, so that physical activity benefits all individuals regardless of genetic risk.


Measurement of liver fat and liver volume
A large-scale multi-modal imaging study (n = 100,000) is ongoing in the UK Biobank (UKB) 18. In the current study, we included a subset of the UKB cohort (n = 32,323) who had undergone neck-to-knee MRI scans, acquired with a Siemens 1.5 T MAGNETOM Aera using a dual-echo Dixon technique, resulting in water-fat volumes covering large parts of the body. Reference measurements of liver fat content, based on proton-density fat fraction (PDFF) maps, were available for only 9,893 subjects at the time of analysis 19. The PDFF maps were based on a single transverse slice of the liver generated with a Siemens 1.5 T MAGNETOM Aera and a three-point Dixon technique 19.
A neural network strategy was established and trained by Langner et al. 19. First, the neural network was trained on the neck-to-knee MR images of those with reference measurements for regression of liver fat values. After tenfold cross-validation, the trained neural network was applied to the rest of the neck-to-knee MRI cohort to infer liver fat content. The liver fat content measures generated through the neural network-based approach correlated well with the reference PDFF method (R² = 0.94) 19.
A similar approach was applied to generate liver volume measures. Neck-to-knee MR images (only abdominal stations were considered) were used to computationally estimate liver volume using a deep learning approach originally established for kidney segmentation by Langner et al. 20. Briefly, abdominal water-signal images were used in the segmentation process. A segmentation model was trained on manual segmentations from 97 subjects and was used to predict liver volume in all subjects with neck-to-knee MRI scans. Liver volume was estimated by multiplying the number of segmented voxels by the size of one voxel (liver volume = number of segmented voxels × size of 1 voxel). The model showed high accuracy against known manual segmentations (R² = 0.86).
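The voxel-counting step can be illustrated with a minimal Python sketch. It is not the segmentation pipeline of Langner et al.; the function name, the nested-list mask layout and the voxel dimensions used in the example are hypothetical and chosen only to make the arithmetic concrete.

```python
from typing import Sequence, Tuple

Mask3D = Sequence[Sequence[Sequence[int]]]


def liver_volume_litres(mask: Mask3D, voxel_size_mm: Tuple[float, float, float]) -> float:
    """Volume of a binary segmentation mask in litres.

    mask holds 1 for voxels labelled as liver and 0 otherwise.
    voxel_size_mm is the (x, y, z) edge length of one voxel in millimetres
    (hypothetical values; the scanner resolution is not given in the text).
    """
    # Count the voxels labelled as liver.
    n_voxels = sum(value for plane in mask for row in plane for value in row)
    dx, dy, dz = voxel_size_mm
    voxel_mm3 = dx * dy * dz
    # liver volume = number of segmented voxels x size of 1 voxel
    return n_voxels * voxel_mm3 / 1_000_000.0  # 1 litre = 1e6 mm^3


# Toy example: a 2 x 2 x 2 block of liver voxels, each 10 mm on a side.
toy_mask = [[[1, 1], [1, 1]], [[1, 1], [1, 1]]]
toy_volume = liver_volume_litres(toy_mask, (10.0, 10.0, 10.0))
```

Because the volume is a simple product, any error in the estimate comes from the segmentation mask itself rather than from this conversion.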
In the UKB, individuals who were deemed unrelated and had passed extensive quality control were included, as described by Bycroft et al. 21. We further filtered for ethnic background, and only Caucasians were included in the analyses. For the current analyses, liver fat content (n = 27,243) and liver volume (n = 24,752) measures were available for UKB study participants with Caucasian ancestry.

Genetic risk score
The data collection, genotyping and quality control in the UKB study have been described in detail elsewhere 21. All UKB participants have been genotyped. Briefly, the genotyping was performed on blood samples using one of two arrays (UK BiLEVE Axiom Array and UK Biobank Axiom Array) that share 95% of their genetic markers. The genotypes were further imputed using the Haplotype Reference Consortium and the UK10K haplotype resource 21. In the present study, version 3 of the imputed genotypes was used.
Recently, Liu et al. performed a GWAS (n = 32,858) of abdominal MRI-derived phenotypes in the UKB and discovered 12 genetic variants associated with liver fat content and 12 genetic variants associated with liver volume 5. Only independent genetic variants were included in the current study; a similar strategy has been used previously 22. Ten liver fat content-associated and 11 liver volume-associated genetic variants were thus extracted from the UKB genetic data and recoded based on the liver fat content/volume-increasing alleles (Table S2).
Unweighted genetic risk scores for each study participant were determined by aligning trait-increasing alleles, both for liver fat content (GRS LF) and for liver volume (GRS LV), and summing the total number of risk alleles, using methods that have been described previously 23. The GRS LF ranged from 2 to 18, while the GRS LV ranged from 0 to 14 among the study participants. For the genetic risk scores, imputed genetic variants with a genotype dosage < 0.5 were recoded as 0, those with a genotype dosage of > 0.5 to ≤ 1.50 were recoded as 1, and those with a genotype dosage of > 1.50 were recoded as 2. We found that two of the liver fat content-associated variants were in strong LD with liver volume-associated variants (rs4665985 and rs1260326, R² = 0.34; rs58542926 and rs58489806, R² = 0.80), and these were therefore excluded from the GRS LF and GRS LV in the sensitivity analyses.
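The scoring rule described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code used in the study: the function names are hypothetical, the dosages are assumed to lie between 0 and 2, and a dosage of exactly 0.5, which the thresholds above leave open, is assigned to 0 here.

```python
from typing import Sequence


def dosage_to_allele_count(dosage: float) -> int:
    """Hard-call an imputed dosage (0 to 2) of the trait-increasing allele."""
    if not 0.0 <= dosage <= 2.0:
        raise ValueError("dosage must lie in [0, 2]")
    if dosage > 1.5:
        return 2
    if dosage > 0.5:
        return 1
    # Assumption: a dosage of exactly 0.5 is coded as 0.
    return 0


def unweighted_grs(dosages: Sequence[float],
                   counts_increasing_allele: Sequence[bool]) -> int:
    """Sum of trait-increasing alleles across variants (unweighted score).

    counts_increasing_allele[i] is False when the dosage of variant i counts
    the other allele, in which case it is flipped (2 - dosage) before coding.
    """
    if len(dosages) != len(counts_increasing_allele):
        raise ValueError("one alignment flag is required per variant")
    score = 0
    for dosage, aligned in zip(dosages, counts_increasing_allele):
        if not aligned:
            dosage = 2.0 - dosage  # align to the trait-increasing allele
        score += dosage_to_allele_count(dosage)
    return score


# Toy participant with four variants; the last dosage counts the other allele.
example_score = unweighted_grs([0.1, 0.9, 1.8, 2.0], [True, True, True, False])
```

With ten variants the score is bounded by 0 and 20, and with eleven variants by 0 and 22, which is consistent with the observed ranges reported above.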

Self-reported levels of physical activity
Questions derived from the validated International Physical Activity Questionnaire (IPAQ) short form were used to assess the weekly performance of walking, moderate PA and vigorous PA. For each type of PA, the participants were asked to estimate on how many days in a typical week they spent at least 10 min doing the activity. Participants who reported an activity frequency of at least one day/week were further asked to report how many minutes on a typical day they spent doing the activity. Participants reporting a frequency of zero days/week for any of the activities were given a duration of zero for that activity.
The participants were asked to include walking at work, walking to and from work and walking for sports/leisure when estimating frequency and time of walking. For moderate and vigorous PA, the participants were asked to include activities performed for work, leisure, travel and around the house.
The questionnaire data were handled according to the IPAQ guidelines 24. Individual IPAQ scores in metabolic equivalent of task (MET)-minutes/week were calculated for each type of PA by multiplying the frequency (days/week) by the typical duration (minutes/day) and an activity-specific MET value. Total MET-minutes/week scores were calculated for each participant by summing the weekly MET-minutes for walking, moderate and vigorous PA. The time variables were truncated at 180 min 24.
Participants who answered "do not know" or "prefer not to answer" on either question were excluded. Those reporting "unable to walk", lacking data on either frequency or duration, or with a total activity time exceeding 960 min were also excluded. Further, reported durations of less than 10 min were changed to zero 24. After the exclusions, 23,080 and 20,986 participants were included in the liver fat and liver volume cohorts, respectively.
The participants were divided into three PA groups (low, moderate, and high) based on the categorical score criteria stated in the IPAQ guidelines (Text S1) 24. All exclusions made are presented in Figs. S1 and S2.
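The continuous MET-minutes/week score described in this section can be sketched as follows. The MET values (3.3 for walking, 4.0 for moderate PA, 8.0 for vigorous PA) are those of the standard IPAQ short-form scoring protocol and are an assumption here, since the text above does not list them; the function and variable names are hypothetical, and the participant-level exclusions and the categorical grouping (Text S1) are not reproduced.

```python
from typing import Dict

# Assumed MET values from the standard IPAQ short-form scoring protocol.
MET_VALUES: Dict[str, float] = {"walking": 3.3, "moderate": 4.0, "vigorous": 8.0}

MAX_MINUTES_PER_DAY = 180  # time variables truncated at 180 min
MIN_MINUTES_PER_DAY = 10   # durations below 10 min are set to zero


def met_minutes_per_week(days: Dict[str, int], minutes: Dict[str, float]) -> float:
    """Total MET-minutes/week summed over walking, moderate and vigorous PA."""
    total = 0.0
    for activity, met in MET_VALUES.items():
        n_days = days[activity]
        # A frequency of zero days/week implies a duration of zero.
        duration = minutes[activity] if n_days > 0 else 0.0
        if duration < MIN_MINUTES_PER_DAY:
            duration = 0.0
        duration = min(duration, MAX_MINUTES_PER_DAY)
        total += met * duration * n_days
    return total


# Toy participant: walks 30 min on 5 days, 200 min of moderate PA on 2 days
# (truncated to 180 min), and no vigorous PA.
example_total = met_minutes_per_week(
    days={"walking": 5, "moderate": 2, "vigorous": 0},
    minutes={"walking": 30, "moderate": 200, "vigorous": 60},
)
```

In this toy case walking contributes 3.3 × 30 × 5 = 495 and moderate PA contributes 4.0 × 180 × 2 = 1440 MET-minutes/week, giving a total of 1935.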

Other lifestyle measures
Age at baseline/recruitment was truncated to whole years. The sex of each participant was acquired from the central registry at recruitment. Information on both genotyping array and population substructure (first 20 genetic principal components) was obtained with the genomic UKB data. BMI values were constructed from height and weight measured during the initial Assessment Centre visit. If either height or weight was missing, no BMI value was constructed.
The participants were asked to report their baseline smoking status (never, previous, current or prefer not to answer) in a touchscreen questionnaire. Participants were also asked to report their alcohol consumption status and estimate their current alcohol intake frequency. Individuals reporting an alcohol intake frequency of at least once or twice a week were asked to estimate an average weekly consumption of a variety of alcoholic beverages. These measures were used to estimate an average weekly alcohol intake, from which an estimated daily consumption was derived 25. The Townsend deprivation index was calculated prior to recruitment to the UKB based on the preceding national census output areas. Each participant received a score based on the geographical area determined by their postcode. All analyses were performed as complete-case analyses; the covariates were not imputed, and the number of individuals included in each model is reported in the respective tables.

Assessment of MASLD and chronic liver disease
Cases of MASLD and CLD were defined based on hospital health outcome codes (ICD-9 and ICD-10 codes) (Table S3). Diagnoses across all of the participants' hospital inpatient records were part of the dataset, including diagnoses made both before and after imaging data collection, i.e., prevalent and incident cases.
Self-reported cases of CLD at a nurse's interview were also included (Table S3) 25. If the participants were uncertain about their illness, the interviewer, a trained nurse, tried to classify it based on their description. Any illnesses that the nurse could not code were recorded as a free-text description, which was later reviewed by a doctor, who either coded the illness or marked it as "unclassifiable". The MASLD/CLD cohort included n = 239,308 individuals, among whom n = 172 cases of MASLD and n = 371 cases of CLD were identified.

Statistical analyses
All statistical analyses were performed using Stata (version 15, StataCorp, College Station, TX, USA). Both liver fat content and liver volume variables had skewed distributions and were transformed using rank-based inverse-normal transformation. Linear regression analyses were performed to assess the association of GRS and PA with liver fat content/volume, assuming an additive effect. Interaction analyses for GRSs and PA were performed by introducing an interaction term (GRS × PA) in the regression models, along with the main effect terms. All analyses including genotype as a variable were adjusted for (a) basic model covariates: age, sex, genotyping array, and population substructure (first 20 principal components); (b) main model covariates, i.e., the basic model covariates as well as smoking status (never/previous/current), alcohol consumption (g/day), and Townsend deprivation index; and (c) main model covariates and BMI. Analyses of the effect of PA on liver fat content/liver volume (lacking genotype as a variable) were adjusted for main model covariates except genotyping array and population substructure.
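A rank-based inverse-normal transformation of the kind mentioned above can be sketched in Python using only the standard library. The analyses themselves were run in Stata and the exact variant used is not stated; the Blom offset (c = 3/8) and the averaging of tied ranks below are assumptions, and the function name is hypothetical.

```python
from statistics import NormalDist
from typing import List, Sequence


def rank_inverse_normal(values: Sequence[float], c: float = 3.0 / 8.0) -> List[float]:
    """Map values to normal scores via their ranks.

    Each value with (average) rank r among n observations is mapped to
    Phi^{-1}((r - c) / (n - 2c + 1)); c = 3/8 is the Blom offset (assumed).
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        # Find the run of tied values beginning at position `start`.
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2.0 + 1.0  # ranks are 1-based
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    standard_normal = NormalDist()
    return [standard_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


# Toy right-skewed sample; the outlier (50.0) no longer dominates after transformation.
skewed = [1.0, 5.0, 2.0, 50.0, 3.0]
scores = rank_inverse_normal(skewed)
```

The transformation keeps the ordering of participants but discards the original spacing between values, so regression coefficients are expressed in standard deviation units of the transformed trait.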
To evaluate the effect of PA on GRS LF and GRS LV in relation to MASLD and CLD, linear regression analyses with robust standard errors were performed. We considered this the primary analysis instead of the more standard logistic regression since we wanted to measure interaction on an additive scale. This scale is more relevant when investigating which subgroups would benefit most from PA from a public health perspective 26. Beta coefficients and 95% confidence interval (CI) limits were multiplied by 100 to express change in percentage points, and the results are presented as such.
Logistic regression analyses were also performed to assess the effect of PA on GRS LF and GRS LV in predicting MASLD and CLD. Interaction analyses were performed by introducing an interaction term (GRS × PA) in the logistic regression models, along with the main effect terms. The analyses were adjusted according to the models listed above.
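The two disease models can be written out explicitly to show where the additive and multiplicative scales differ. The notation below is introduced here for illustration and is not taken from the study: Y_i is the binary MASLD or CLD status, z_i collects the adjustment covariates, and PA is shown as a single coded term for brevity, although it may equally enter as a set of group indicators.

```latex
\begin{align}
\Pr(Y_i = 1) &= \beta_0 + \beta_1\,\mathrm{GRS}_i + \beta_2\,\mathrm{PA}_i
  + \beta_3\,(\mathrm{GRS}_i \times \mathrm{PA}_i) + \boldsymbol{\gamma}^{\top}\mathbf{z}_i
  && \text{(linear, additive scale)} \\
\operatorname{logit}\Pr(Y_i = 1) &= \alpha_0 + \alpha_1\,\mathrm{GRS}_i + \alpha_2\,\mathrm{PA}_i
  + \alpha_3\,(\mathrm{GRS}_i \times \mathrm{PA}_i) + \boldsymbol{\delta}^{\top}\mathbf{z}_i
  && \text{(logistic, multiplicative scale)}
\end{align}
```

In the linear model, 100 × β3 is the change, in percentage points, in the per-allele risk difference for a one-unit change in PA. In the logistic model, exp(α3) is a ratio of odds ratios. A test of β3 = 0 therefore addresses interaction on the additive scale, whereas a test of α3 = 0 addresses it on the multiplicative scale.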

Ethics approval statement
The UKB has ethical approval from the North West Multi-Centre Research Ethics Committee (ref: 11/NW/0382), and informed written consent was obtained from all participants prior to the study. The current study was further approved by the Swedish central ethics committee (diary number 2019-03073).

Patient consent statement
All UK Biobank participants gave consent at recruitment.

Characteristics of the study population
The baseline characteristics of the study participants with liver fat content (n = 23,080) and liver volume (n = 20,986) measures are reported according to the level of PA in Table 1. In both the liver fat content and the liver volume sample, more physically active groups generally had lower BMI, triglycerides, C-reactive protein, hemoglobin A1c, alkaline phosphatase and alanine aminotransferase compared to the less active groups. The most active group in both the liver fat content and the liver volume cohort had higher alcohol consumption than the less active groups. The mean liver fat percentage was below the threshold for MASLD (≥ 5%) in all three PA groups. The lowest liver volume measures were observed in the moderately active group, whereas the highest liver volume measures were observed in the most active group.

Genetic predisposition and physical activity in relation to liver fat content and liver volume
Among individuals with liver fat content measures (n = 23,080), GRS LF was positively associated with liver fat content (β = 0.058, 95% CI 0.052–0.065, p-value < 0.001) after adjusting for age, sex, genotyping array, and population substructure. The GRS LF explained 5.3% (R² = 0.053) of the liver fat content phenotypic variance. Higher PA was associated with lower liver fat content (β = −0.143, 95% CI −0.164 to −0.121, p-value < 0.001) when the regression model was adjusted for main model covariates except genotyping array and population substructure.
In the sensitivity analysis, similar results were observed after excluding liver fat content associated genetic variants which were in LD with liver volume associated genetic variants (Table S4).
In the liver volume cohort (n = 20,986), higher GRS LV was associated with higher liver volume (β = 0.066, 95% CI 0.059–0.073, p-value < 0.001) after adjusting for age, sex, genotyping array, and population substructure. The GRS LV explained 1.6% (R² = 0.016) of the phenotypic variance in the liver volume trait. Physical activity was not associated with liver volume (β = −0.018, 95% CI −0.040 to 0.004, p-value = 0.12) when the analyses were adjusted for main model covariates except genotyping array and population substructure.
Higher GRS LF was associated with a higher likelihood of CLD in all three PA groups. Although the strength of association appeared greater in individuals in the low PA group, the interaction results were not statistically significant in any of the adjusted models (Table 4). When these analyses were repeated with logistic regression, GRS LF was not significantly associated with higher odds of MASLD (p interaction = 0.09) or CLD (p interaction = 0.91) when adjusting for main model covariates (Table S6).
A higher GRS LV was associated with higher odds of MASLD (OR = 1.14, 95% CI 1.05–1.24, p-value = 0.001) but not CLD (OR = 1.04, 95% CI 0.98–1.10, p-value = 0.194) when adjusting for age, sex, genotyping array, and population substructure. However, no associations between GRS LV and MASLD or CLD were detected in any of the PA groups when adjusting for main model covariates, and the interaction terms were not significant. This was the case for analyses based on linear regression with robust standard errors (Table 5) and analyses based on logistic regression (Table S7).
In the sensitivity analysis, we excluded two pairs of genetic variants (rs4665985 and rs1260326, rs58542926 and rs58489806) that were in LD for liver fat and liver volume-associated traits. We found that the results remained materially similar in all analyses (Tables S8-S11).

Discussion
To our knowledge, this is the first large-scale epidemiological study investigating interactions between PA and genetic predisposition to objectively measured liver fat content and liver volume in predicting the respective traits. We found that the genetic effects on these traits were not modified by PA. Furthermore, we observed that PA did not modify the impact of liver-associated genetic predisposition on the likelihood of having a MASLD or CLD diagnosis.
A previous study has demonstrated that higher PA can mitigate the impact of genetic predisposition to higher alanine aminotransferase (ALT) levels on measured ALT, which served as a proxy for MASLD 15. In contrast, we examined MRI-derived measurements of liver fat content and liver volume inferred through a neural network-based approach instead of ALT levels. While serum biomarkers like ALT are practical for large-scale epidemiological studies due to their routine assessment and good inter-laboratory reproducibility, they have certain limitations. For example, they lack specificity for the liver, and adjusting for comorbid conditions leading to elevated ALT levels can be complex 17,27. Further, around 19% of European MASLD patients have normal ALT values, so a large proportion of MASLD cases are likely excluded when ALT levels are used as a proxy for MASLD 28.
Table 4. The association of liver fat content associated genetic variants with MASLD and CLD (n = 239,308). The basic model (n = 239,308) was adjusted for age, sex, first 20 genetic principal components and genotyping array. Main model (n = 173,387) covariates include basic model covariates + alcohol consumption (g/day), smoking status and Townsend deprivation index. Main + BMI model (n = 173,029) covariates include those of the main model and body mass index (BMI). MASLD, metabolic dysfunction associated steatotic liver disease; CLD, chronic liver disease. *Betas and 95% CI limits were multiplied by 100 and reported as percentage points.
www.nature.com/scientificreports/
The non-invasive MRI-PDFF measures have been shown to correlate well with gold-standard histology-determined steatosis in patients with MASLD (p-value < 0.0001) 29. By utilizing deep learning, a larger population than those with MRI-PDFF measures in the UKB could be included in our study 19,30. To further enhance the clinical relevance of our study, we also examined diagnoses of MASLD and CLD, as determined by healthcare professionals at hospitals. Despite its clinical relevance, relying on diagnoses set by healthcare professionals may miss cases of MASLD and CLD, as not all individuals seek medical care. Furthermore, it is well known that healthy volunteer bias in cohorts like the UKB compromises generalizability and external validity 31. The healthy volunteer bias likely contributed to the substantially lower prevalence of MASLD and CLD in our cohort compared to the general population. This bias also resulted in generally healthy liver fat content and liver volume study populations, based on biomarker characteristics.
Large sample sizes are needed to identify interactions, especially if the effect estimates are of low magnitude 32,33. However, the beta estimates presented here were similar across groups, indicating that no clinically significant interactions were present. The only exception was that individuals in the high PA group had higher odds of MASLD compared to those in the moderate and low PA groups. This could be due to reverse causation, a limitation of including both prevalent and incident cases. Individuals who increased their PA after diagnosis could inflate the odds in the high PA group. In this scenario, the higher odds ratios do not reflect a true elevation of risk, as the cases are prevalent. A recent prospective study demonstrated that PA did not attenuate the genetic effect in relation to MASLD 16. PA was assessed with the IPAQ, as in our study, but PA groups were constructed based on MET values and not on the IPAQ categorical score criteria used in the present study. Even though only incident cases were included and their frequency was more in line with that expected, no significant interactions between PA and genetic risk were identified among individuals with high genetic risk 16. Therefore, further, larger studies with incident cases of MASLD and CLD are required to examine the potential effect of PA on genetic risk for these outcomes in more depth. Future studies should also explore interactions between liver-associated genetic risk and PA in the more severe and progressive forms of MASLD, namely metabolic dysfunction-associated steatohepatitis (MASH) and MASH with fibrosis.
Our results suggest that PA interventions should be recommended to everyone with MASLD, given that we observed that higher physical activity has a protective effect in relation to liver fat content, liver volume, MASLD and CLD, in line with previous studies.However, targeting PA interventions based on current liver associated genetic information does not appear to increase efficiency as no interaction effects were identified.
Strengths of the present study include deeply phenotyped liver fat content and liver volume measures inferred through deep learning, which were strongly correlated with the UKB-based MRI reference measures 19. The categories of PA we used are beneficial for interpretation of the results, since the most active group reaches the recommendations for health-enhancing PA 24,34. Furthermore, we considered potential confounding factors including alcohol consumption, BMI, and material deprivation (Townsend deprivation index). The genetic variants used to construct the genetic risk scores (GRS LF and GRS LV) were reported to associate with liver fat content and liver volume, respectively, in a previous GWAS 5. Several of the genetic variants used are shared between studies with reported associations with MASLD, liver fat, liver disorders and metabolic traits, supporting our use of these genetic variants 7,8,35–39. Since the original GWAS was conducted within the UKB data, there was a risk that effect estimates for the genetic component would be overestimated compared with an independent sample. We therefore created unweighted genetic risk scores to reduce this potential overestimation of effects.
Our study also had several limitations. Physical activity assessment by wrist-worn accelerometer is available for approximately 100,000 UKB participants to date. Unfortunately, the accelerometer cohort did not substantially overlap with the liver fat content and liver volume study samples. As such, we decided to use questionnaire-based measures of PA to improve statistical power. The IPAQ short form has previously shown acceptable results for both reliability and validity 40,41. However, self-reported measures of PA are subjective and prone to measurement errors, including misreporting or cognitive difficulties recalling performed activities 41. As being physically active is recommended and to some extent socially rewarded, less active individuals might report higher levels of activity than they in fact performed. Misreporting due to cognitive difficulties should, on the other hand, be more random.
While the UKB participants were recruited during 2006-2010, the imaging study did not start until 2014, meaning that the baseline PA data used in the present study were collected some years before the participants underwent the radiological examinations. As the PA and MRI-PDFF measures were not collected simultaneously, changes in behavior or liver health might have occurred between the assessment periods. However, the advantage is the reduced risk of reverse causality.
In conclusion, we did not observe that PA modified the effect of genetic predisposition on elevated liver fat content and liver volume, measured without a human operator. Our results suggest that targeting PA interventions by using current knowledge of liver-associated genetic risk is not likely to improve public health nor offer a route to more cost-efficient healthcare.

Figure 1. Association between the liver fat associated genetic risk score (GRS LF) and liver fat content across three levels of physical activity. The participants were divided into low and high genetic risk based on the median value (median = 10.016). The two GRS groups were significantly associated with liver fat content (p-value < 0.001). The error bars represent 95% CIs.

Figure 2. Association between the liver volume associated genetic risk score (GRS LV) and liver volume across three levels of physical activity. The participants were divided into low and high genetic risk based on the median value (median = 6.0). The two GRS groups were significantly associated with liver volume (p-value < 0.001). The error bars represent 95% CIs.

Table 1. Characteristics of the UK Biobank participants included in the liver fat content and liver volume study populations across three levels of physical activity. Data are presented as mean (standard deviation) unless otherwise specified.

Table 3. Association between GRS LV and liver volume across different levels of PA (n = 20,986). Basic model (n = 20,986) covariates include genotyping array, first 20 genetic principal components, age, and sex. Main model (n = 16,466) covariates include basic model covariates + alcohol consumption (g/day), smoking status and Townsend deprivation index. Main + BMI model (n = 16,305) covariates include those of the main model and body mass index (BMI). GRS LV, liver volume associated genetic risk score.

Table 5. The association of liver volume associated genetic variants with MASLD and CLD (n = 239,308). The basic model (n = 239,308) was adjusted for age, sex, first 20 genetic principal components and genotyping array. Main model (n = 173,387) covariates include basic model covariates + alcohol consumption (g/day), smoking status and Townsend deprivation index. Main + BMI model (n = 173,029) covariates include those of the main model and body mass index (BMI). MASLD, metabolic dysfunction associated steatotic liver disease; CLD, chronic liver disease. *Betas and 95% CI limits were multiplied by 100 and reported as percentage points.